Humans form mental images of 3D scenes to support counterfactual imagination, planning, and motor control. Our abilities to predict the appearance and affordance of the scene from previously unobserved viewpoints aid us in performing manipulation tasks (e.g., 6-DoF kitting) with a level of ease that is currently out of reach for existing robot learning frameworks. In this work, we aim to build artificial systems that can analogously plan actions on top of imagined images. To this end, we introduce Mental Imagery for Robotic Affordances (MIRA), an action reasoning framework that optimizes actions with novel-view synthesis and affordance prediction in the loop. Given a set of 2D RGB images, MIRA builds a consistent 3D scene representation, through which we synthesize novel orthographic views amenable to pixel-wise affordances prediction for action optimization. We illustrate how this optimization process enables us to generalize to unseen out-of-plane rotations for 6-DoF robotic manipulation tasks given a limited number of demonstrations, paving the way toward machines that autonomously learn to understand the world around them for planning actions.
translated by 谷歌翻译
自我定位是一种基本功能,移动机器人导航系统集成到使用地图从一个点转移到另一点。因此,任何提高本地化精度的增强对于执行精致的灵活性任务至关重要。本文描述了一个新的位置,该位置使用Monte Carlo定位(MCL)算法维护几个颗粒人群,始终选择最佳的粒子作为系统的输出。作为新颖性,我们的工作包括一种多尺度匹配匹配算法,以创建新的MCL群体和一个确定最可靠的指标。它还贡献了最新的实现,从错误的估计或未知的初始位置增加了恢复时间。在与NAV2完全集成的模块中评估了所提出的方法,并与当前的最新自适应ACML溶液进行了比较,从而获得了良好的精度和恢复时间。
translated by 谷歌翻译
虚拟现实(VR)耳机提供了一种身临其境的立体视觉体验,但以阻止用户直接观察其物理环境的代价。传递技术旨在通过利用向外的摄像头来重建否则没有耳机的用户可以看到的图像来解决此限制。这本质上是一个实时视图综合挑战,因为传递摄像机不能与眼睛进行物理共同。现有的通行技术会遭受分散重建工件的注意力,这主要是由于缺乏准确的深度信息(尤其是对于近场和分离的物体),并且表现出有限的图像质量(例如,低分辨率和单色)。在本文中,我们提出了第一种学习的传递方法,并使用包含立体声对RGB摄像机的自定义VR耳机评估其性能。通过模拟和实验,我们证明了我们所学的传递方法与最先进的方法相比提供了卓越的图像质量,同时满足了实时的,透视透视的立体视图综合的严格VR要求,从而在广泛的视野上综合用于桌面连接的耳机。
translated by 谷歌翻译
The estimation of the generalization error of classifiers often relies on a validation set. Such a set is hardly available in few-shot learning scenarios, a highly disregarded shortcoming in the field. In these scenarios, it is common to rely on features extracted from pre-trained neural networks combined with distance-based classifiers such as nearest class mean. In this work, we introduce a Gaussian model of the feature distribution. By estimating the parameters of this model, we are able to predict the generalization error on new classification tasks with few samples. We observe that accurate distance estimates between class-conditional densities are the key to accurate estimates of the generalization performance. Therefore, we propose an unbiased estimator for these distances and integrate it in our numerical analysis. We show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy in few-shot settings.
translated by 谷歌翻译
Assessing the critical view of safety in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. In this work, we propose to capture each of these aspects by modeling the surgical scene with a disentangled latent scene graph representation, which we can then process using a graph neural network. Unlike previous approaches using graph representations, we explicitly encode in our graphs semantic information such as object locations and shapes, class probabilities and visual features. We also incorporate an auxiliary image reconstruction objective to help train the latent graph representations. We demonstrate the value of these components through comprehensive ablation studies and achieve state-of-the-art results for critical view of safety prediction across multiple experimental settings.
translated by 谷歌翻译
Nowadays, the applications of hydraulic systems are present in a wide variety of devices in both industrial and everyday environments. The implementation and usage of hydraulic systems have been well documented; however, today, this still faces a challenge, the integration of tools that allow more accurate information about the functioning and operation of these systems for proactive decision-making. In industrial applications, many sensors and methods exist to measure and determine the status of process variables (e.g., flow, pressure, force). Nevertheless, little has been done to have systems that can provide users with device-health information related to hydraulic devices integrated into the machinery. Implementing artificial intelligence (AI) technologies and machine learning (ML) models in hydraulic system components has been identified as a solution to the challenge many industries currently face: optimizing processes and carrying them out more safely and efficiently. This paper presents a solution for the characterization and estimation of anomalies in one of the most versatile and used devices in hydraulic systems, cylinders. AI and ML models were implemented to determine the current operating status of these hydraulic components and whether they are working correctly or if a failure mode or abnormal condition is present.
translated by 谷歌翻译
The aim of this work is to introduce MaRF, a novel framework able to synthesize the Martian environment using several collections of images from rover cameras. The idea is to generate a 3D scene of Mars' surface to address key challenges in planetary surface exploration such as: planetary geology, simulated navigation and shape analysis. Although there exist different methods to enable a 3D reconstruction of Mars' surface, they rely on classical computer graphics techniques that incur high amounts of computational resources during the reconstruction process, and have limitations with generalizing reconstructions to unseen scenes and adapting to new images coming from rover cameras. The proposed framework solves the aforementioned limitations by exploiting Neural Radiance Fields (NeRFs), a method that synthesize complex scenes by optimizing a continuous volumetric scene function using a sparse set of images. To speed up the learning process, we replaced the sparse set of rover images with their neural graphics primitives (NGPs), a set of vectors of fixed length that are learned to preserve the information of the original images in a significantly smaller size. In the experimental section, we demonstrate the environments created from actual Mars datasets captured by Curiosity rover, Perseverance rover and Ingenuity helicopter, all of which are available on the Planetary Data System (PDS).
translated by 谷歌翻译
This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, resulting in flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, aiming to promote this exciting and emerging research area based on UASR through open-source activity.
translated by 谷歌翻译
Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements to various SLU tasks. However, because of the mismatched modalities between speech signals and text tokens, previous methods usually need complex designs of the frameworks. This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models, resulting in an unsupervised speech-to-semantic pre-trained model for various tasks in SLU. To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models. Our experiments show that unsupervised ASR itself can improve the representations from speech self-supervised models. More importantly, it is shown as an efficient connector between speech and textual pre-trained models, improving the performances of five different SLU tasks. Notably, on spoken question answering, we reach the state-of-the-art result over the challenging NMSQA benchmark.
translated by 谷歌翻译
We study a general matrix optimization problem with a fixed-rank positive semidefinite (PSD) constraint. We perform the Burer-Monteiro factorization and consider a particular Riemannian quotient geometry in a search space that has a total space equipped with the Euclidean metric. When the original objective f satisfies standard restricted strong convexity and smoothness properties, we characterize the global landscape of the factorized objective under the Riemannian quotient geometry. We show the entire search space can be divided into three regions: (R1) the region near the target parameter of interest, where the factorized objective is geodesically strongly convex and smooth; (R2) the region containing neighborhoods of all strict saddle points; (R3) the remaining regions, where the factorized objective has a large gradient. To our best knowledge, this is the first global landscape analysis of the Burer-Monteiro factorized objective under the Riemannian quotient geometry. Our results provide a fully geometric explanation for the superior performance of vanilla gradient descent under the Burer-Monteiro factorization. When f satisfies a weaker restricted strict convexity property, we show there exists a neighborhood near local minimizers such that the factorized objective is geodesically convex. To prove our results we provide a comprehensive landscape analysis of a matrix factorization problem with a least squares objective, which serves as a critical bridge. Our conclusions are also based on a result of independent interest stating that the geodesic ball centered at Y with a radius 1/3 of the least singular value of Y is a geodesically convex set under the Riemannian quotient geometry, which as a corollary, also implies a quantitative bound of the convexity radius in the Bures-Wasserstein space. The convexity radius obtained is sharp up to constants.
translated by 谷歌翻译